Abstract

This paper describes a system for generating natural-language sentences from
an interlingual representation, Lexical Conceptual Structure (LCS). The system
has been developed as part of a Chinese-English Machine Translation system;
however, it is designed to be used for many other MT language pairs and
Natural Language applications. The contributions of this work include:
(1) Development of a language-independent generation system that maximizes
efficiency through the use of a hybrid rule-based/statistical module;
(2) Enhancements to an interlingual representation and associated algorithms
for interpretation of multiply ambiguous input sentences; (3) Development of
an efficient reusable language-independent linearization module with a grammar
description language that can be used with other systems; (4) Improvements to
an earlier algorithm for hierarchically mapping thematic roles to surface
positions; (5) Development of a diagnostic tool for lexicon coverage and
correctness and use of the tool for verification of English, Spanish, and
Chinese lexicons. An evaluation of translation quality shows comparable
performance with a commercial translation system. The generation system can
also be straightforwardly extended to other languages and this is demonstrated
and evaluated for Spanish.

Keywords: Generation, Machine Translation, Interlingua, Lexical Conceptual
Structure, Language-Independent NLP

1.  Introduction 

This paper describes a system for generating natural-language sentences
from an interlingual representation, Lexical Conceptual Structure (LCS).
The system has been developed as part of a Chinese-English Machine
Translation (MT) system; however, it is designed to be used for many other
MT language pairs (e.g., Spanish and Arabic (Dorr et al., 1995)) and other
natural language applications (e.g., cross-language information retrieval
(Dorr et al., 2000)).

The contributions of this work include: (1) Development of a
language-independent generation system that maximizes efficiency through the
use of a hybrid rule-based/statistical module; (2) Enhancements to an
interlingual representation and associated algorithm (Dorr, 1993b) for
interpretation of multiply ambiguous input sentences; (3) Development
of an efficient reusable language-independent linearization module with
a grammar description language that can be used with other systems;
(4) Improvements to an earlier algorithm (Dorr et al., 1998) for
hierarchically mapping thematic roles to surface positions; (5) Development
of a diagnostic tool for lexicon coverage and correctness and use of
the tool for verification of English, Spanish, and Chinese lexicons. An
evaluation of translation quality shows comparable performance with
a commercial translation system. The generation system can also be
straightforwardly extended to other languages and this is demonstrated
and evaluated for Spanish.

We  will  provide  an  overview  of  LCS-based  MT  and  then  describe 
our  interlingual  representation.  We  will  then  examine  the  generation 
component  of  our  MT  system  in  detail,  followed  by  an  evaluation  of 
different  aspects  of  our  system. 


2.  Overview  of  LCS-based  Machine  Translation 

One of the major challenges in natural language processing is the ability
to make use of existing resources. Large differences in syntax, semantics,
and ontologies of such resources create significant barriers to their
usage in large-scale applications. A case in point is the wide range of
“interlingual representations” used in machine translation and
cross-language processing. Such representations are becoming increasingly
prevalent, yet views vary widely as to what these should be composed
of, varying from purely conceptual knowledge-representations, having
little to do with the structure of language, to very syntactic
representations, maintaining most of the idiosyncrasies of the source
languages. In our generation system we make use of resources associated
with two different (kinds of) interlingua structures: Lexical Conceptual
Structure (LCS), and the Abstract Meaning Representations (AMR) used
at USC/ISI (Langkilde and Knight, 1998a). The two representations
serve different but complementary roles in the translation process. The
deeper lexical-semantic expressiveness of LCS is essential for
language-independent lexical selection that transcends translation
divergences (Dorr, 1993a). The shallower yet mixed semantic-syntactic
nature of AMRs makes it easier to use directly for target-language
realization.

The use of two representations in generation mirrors the use of two
representations on the analysis side of the MT system, in which a
parsing output is passed to a semantic-composition module; the
target-language AMR is analogous to the source-language parse tree. (See
Figure 1.) The Composition module takes the source-language parse
tree and creates a deeper semantic representation (the LCS) using
a source-language lexicon. In generation, the Decomposition module
performs a reverse step that uses a target-language lexicon to create
the hierarchical word and feature structure, a “parse-like” AMR. The
linearization module flattens an AMR into a sequence of words. Because
of the ambiguity inherent in all of the involved modules from the parser
to the lexicons, multiple sequences are created. We use the statistical
Extraction module of the generation system Nitrogen (Langkilde
and Knight, 1998a; Langkilde and Knight, 1998b) to select among
alternative outputs, using n-gram probabilities of target-language word
sequences.
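
The idea of selecting among alternative word sequences by n-gram probability
can be illustrated with a minimal Python sketch. It is not Nitrogen's
implementation: Nitrogen scores paths in a word lattice, whereas this sketch
ranks an explicit list of candidate strings. The function names
(train_bigram, log_prob, extract_best), the toy corpus, and the add-one
smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count context unigrams and bigrams over sentences with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])              # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability (smoothing is an assumption)."""
    vocab_size = len(unigrams)  # word types plus </s> equals context types here
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
        for prev, word in zip(tokens, tokens[1:])
    )

def extract_best(candidates, unigrams, bigrams):
    """Return the candidate sequence with the highest bigram score."""
    return max(candidates, key=lambda s: log_prob(s, unigrams, bigrams))

# Toy target-language corpus and three alternative linearizations.
corpus = ["john broke into the room",
          "john jogged to school",
          "the river runs to the sea"]
candidates = ["john broke the room into",
              "john broke into the room",
              "room the into broke john"]
unigrams, bigrams = train_bigram(corpus)
best = extract_best(candidates, unigrams, bigrams)
```

With this toy corpus the fluent ordering wins because every one of its
bigrams has been observed, while the scrambled alternatives rely on smoothed
counts for unseen bigrams.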


Figure  1.  LCS-based  Machine  Translation 


3.  Lexical  Conceptual  Structure 

Linguistic knowledge in the lexicon covers a wide range of information
types, such as verbal subcategorization for events (e.g., that a
transitive verb such as “hit” occurs with an object noun phrase), featural
information (e.g., that the direct object of a verb such as “frighten” is
animate), thematic information (e.g., that “John” is the agent in “John
hit the ball”), and lexical-semantic information (e.g., that spatial verbs
such as “throw” are conceptually distinct from verbs of possession such
as “give”). By modularizing the lexicon, we treat each information type
separately, thus allowing us to vary the degree of dependence on each
level.

The most intricate component of lexical knowledge is the
lexical-semantic information, which is encoded in the form of Lexical
Conceptual Structure (LCS) as formulated by Dorr (Dorr, 1993b; Dorr,
1994) based on work by Jackendoff (Jackendoff, 1983; Jackendoff, 1990;
Jackendoff, 1996). LCS is a compositional abstraction with
language-independent properties that transcend structural idiosyncrasies.
This representation has been used as the interlingua of several projects
such as UNITRAN (Dorr, 1993a) and MILT (Dorr, 1997a).

Formally, an LCS is a directed graph with a root. Each node is
associated with certain information, including a type, a primitive, and
a field. The type of an LCS node is one of Event, State, Path, Manner,
Property or Thing. There are two general classes of primitives: closed
class (also called structural primitives, e.g., CAUSE, GO, BE, TO) and
open class primitives (also called constants, e.g., john+, reduce+ed,
jog+ingly). Suffixes such as +, +ed, +ingly are markers of open class
primitives, signaling also the type of the primitive (thing, property,
event, etc.). We distinguish between the structural primitive GO and
the constant go+ingly: the first appears in many lexical entries but the
second appears only in specific lexical entries such as the one for the
English verb “go”. Examples of fields include Locational, Possessional,
and Identificational. Structurally, an LCS node has zero or more
LCS children. There are three ways a child node relates to its parent:
as a subject (maximally one), as an argument, or as a modifier.

An LCS captures the semantics of a lexical item through a
combination of semantic structure (specified by the shape of the graph
and its structural primitives and fields) and semantic content (specified
through constants). The semantic structure of a verb is something the
verb shares with a semantic verb class whereas the content is specific to
the verb itself. For example, all the verbs in the semantic class of “Run”
verbs have the same semantic structure but vary in their semantic
content (for example, run, jog, walk, zigzag, jump, roll, etc.). Semantic
verb classes were initially borrowed from the classification in English
Verb Classes and Alternations (EVCA) (Levin, 1993). Our LCS Verb
Database (LVD) extends EVCA by refining the class divisions^1 and
defining the underlying meaning components of each class in the LCS
representation. LVD also provides a relation between Levin’s classes
and both thematic role information and hand-tagged WordNet synset
numbers. The first public release of the LCS Verb Database is now
available for research purposes (Dorr, 2001).

^1 Levin’s original database contained 192 classes, numbering between 9.1
and 57; our refined version contains 492, with more specific identifiers
such as “51.3.2.a.ii”.

Consider the sentence John jogged to school. This can be fully
represented (except for features such as tense, telicity, etc.) as follows,
roughly corresponding to ‘John moved (location) to the school in a
jogging manner’:

(1) (event go loc
      (thing john+)
      (path to loc
        (thing john+)
        (position at loc (thing john+) (thing school+)))
      (manner jog+ingly))
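
As an illustration only, the structure in (1) can be held in a small tree
type. The Python sketch below is not the system's internal representation:
the class name LCSNode, its attribute names, and the reading of the first
child of each node as its subject are assumptions, and the sketch simplifies
the rooted directed graph to a tree.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class LCSNode:
    """One LCS node: a type, a primitive, an optional field, and children."""
    node_type: str                       # e.g. event, path, position, thing, manner
    primitive: str                       # structural primitive or constant
    sem_field: Optional[str] = None      # e.g. loc, poss, ident
    subject: Optional["LCSNode"] = None  # at most one subject child
    arguments: List["LCSNode"] = field(default_factory=list)
    modifiers: List["LCSNode"] = field(default_factory=list)

    def to_sexpr(self) -> str:
        """Render the node in the parenthesized notation used in (1)."""
        parts = [self.node_type, self.primitive]
        if self.sem_field:
            parts.append(self.sem_field)
        children = ([self.subject] if self.subject else []) \
            + self.arguments + self.modifiers
        parts.extend(child.to_sexpr() for child in children)
        return "(" + " ".join(parts) + ")"

def john() -> LCSNode:
    """Fresh node for the constant john+ (a tree needs one node per occurrence)."""
    return LCSNode("thing", "john+")

position = LCSNode("position", "at", "loc", subject=john(),
                   arguments=[LCSNode("thing", "school+")])
path = LCSNode("path", "to", "loc", subject=john(), arguments=[position])
jog = LCSNode("event", "go", "loc", subject=john(), arguments=[path],
              modifiers=[LCSNode("manner", "jog+ingly")])
```

Calling jog.to_sexpr() reproduces the notation of (1) on a single line, which
gives a quick check that the subject, argument, and modifier slots were
filled as intended.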

The lexicon entries for one sense of the English verb ‘jog’ and the
preposition ‘to’ are shown in Figure 2. These entries include the root
form of the word, its semantic verb class and word sense(s) from
WordNet (Fellbaum, 1998) (for the verbs), and most importantly, a Root
LCS (RLCS) which is the uninstantiated LCS corresponding to the
underlying meaning of the word entry in the lexicon.

The top node in the “jog” RLCS has the structural primitive GO in
the locational field. Its subject is marked with a star (*); star-marked
nodes must be filled recursively with other lexical entries during semantic
composition. The restriction on this particular LCS node is that the
filler must be of type thing. The number ‘2’ in that node specifies the
thematic role: in this case, theme. The second and third child nodes are
in argument positions filled with the primitives FROM and TO. The
numbers ‘3’ and ‘5’ stand for source particle and goal particle respectively.
The numbers ‘4’ and ‘6’ stand for source and goal. Figure 3 contains
a list of variable numbers with their associated thematic roles. The
second argument in the “jog” RLCS is the substructure (to loc ...)
that unifies with the RLCS for the preposition “to”. This secondary
RLCS itself has a star-marked argument that must be instantiated
with a thing such as “school”.

The field :THETA_ROLES specifies the set of thematic roles appearing
in the RLCS entry. Theta roles preceded by an underscore (_)
are obligatory, whereas roles preceded by a comma (,) are optional.
Parentheses indicate that the corresponding phrases must necessarily
be headed by a preposition. Sometimes the specific preposition is
provided inside the parentheses. The roles are ordered in a canonical order
that reflects their relative surface order: first available role is subject;
second is object; etc.
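
The :THETA_ROLES notation is compact enough to decode mechanically. The
following Python sketch is an illustration under stated assumptions, not code
from the system: the names ThetaRole and parse_theta_roles are hypothetical,
and the regular expression covers only the forms described above (a leading
underscore or comma, a role name, and optional parentheses that may hold a
preposition).

```python
import re
from typing import List, NamedTuple, Optional

class ThetaRole(NamedTuple):
    name: str                   # e.g. "th", "src", "goal"
    obligatory: bool            # True when introduced by "_", False for ","
    prepositional: bool         # True when followed by parentheses
    preposition: Optional[str]  # specific preposition, if one is given

# marker, role name, optional "(...)" group, optional preposition inside it
_ROLE = re.compile(r"([_,])([a-z-]+)(\(([a-z]*)\))?")

def parse_theta_roles(spec: str) -> List[ThetaRole]:
    """Decode a :THETA_ROLES string, keeping the canonical surface order."""
    roles = []
    for marker, name, parens, prep in _ROLE.findall(spec.replace(" ", "")):
        roles.append(ThetaRole(name, marker == "_", bool(parens), prep or None))
    return roles

jog_roles = parse_theta_roles("_th,src(),goal()")
reduce_roles = parse_theta_roles("_ag_th,instr(with)")
```

Because the list preserves the canonical order, the first entry of the result
is the role that surfaces as subject, the second as object, and so on.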


mtj2.tex;  5/09/2001;  12:10;  p.5 



(DEFINE-WORD
 :DEF_WORD "jog"
 :CLASS "51.3.2.a.ii"
 :THETA_ROLES "_th,src(),goal()"
 :WN_SENSE (01315785 01297547)
 :LANGUAGE ENGLISH
 :LCS
 (event go loc (* thing 2)
  ((* path from 3) loc (thing 2)
   (position at loc (thing 2) (thing 4)))
  ((* path to 5) loc (thing 2)
   (position at loc (thing 2) (thing 6)))
  (manner jog+ingly 26))
 :VAR_SPEC ((3 :optional) (5 :optional)))

(DEFINE-WORD
 :DEF_WORD "to"
 :LANGUAGE ENGLISH
 :LCS (path to loc
       (thing 2)
       (position in loc (thing 2) (* thing 6))))


Figure  2.  Lexicon  Entries  for  jog  and  to 


The field :WN_SENSE links the entry to its corresponding WordNet
synset. The lexicon entries use WordNet 1.6 senses (Fellbaum,
1998; Miller and Fellbaum, 1991). The variable specifications (indicated
here as :VAR_SPEC) assign the arguments headed by FROM and TO an
:optional status. Other possible variable specifications that appear in
our lexicon include :obligatory, :promote, :demote, :EXT (external),
:INT (internal) and :conflated (see (Dorr, 1993a) for more details).

The current English lexicon contains over 11000 RLCS entries such
as those in Figure 2 (see also Figure 8 later). These entries correspond
to different senses of over 4000 verbs. Figure 4 compares four of the nine
RLCS entries for the verb “run”. These entries are classified by verb
class. Verb classes are used as templates to generate the RLCS entries
of verbs in the class. For example, the lexical entry for “bake” in class
26.3 would be identical to the top RLCS entry shown in Figure 4, except
that node 9 would instead contain the primitive bake+ed rather than
run+ed.
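
The template idea can be sketched in a few lines of Python. This is only an
illustration of the principle, not the lexicon-building code: the dictionary
CLASS_TEMPLATES, the function instantiate, and the use of a {verb}
placeholder are assumptions, and the single template shown copies the class
26.3 structure from Figure 4.

```python
# One RLCS template per verb class; {verb} marks the class-specific constant.
CLASS_TEMPLATES = {
    "26.3": (
        "(event cause (* thing 1) "
        "(event go ident (* thing 2) "
        "(path toward ident (thing 2) "
        "(position at ident (thing 2) (property {verb}+ed 9)))) "
        "((* for 17) poss (*head*) (* thing 18)))"
    ),
}

def instantiate(verb_class: str, verb: str) -> str:
    """Fill the constant slot of a class template with a particular verb."""
    return CLASS_TEMPLATES[verb_class].format(verb=verb)

run_rlcs = instantiate("26.3", "run")
bake_rlcs = instantiate("26.3", "bake")
```

The two results differ only in the constant at node 9, which is the property
that the lexical access phase later exploits when it indexes entries by their
most distinguishing primitive.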

As described in (Dorr, 1993b), the meaning of complex phrases is
captured through a composed LCS (CLCS). A CLCS is constructed
 #   Thematic Role     Definition
 0                     no thematic role assigned
 1   AG                agent
 2   TH, EXP, INFO     theme or experiencer or information
 3   SRC()             source preposition
 4   SRC               source
 5   GOAL(), PRED()    goal or pred preposition
 6   GOAL              goal
 7   PERC()            perceived item particle
 8   PERC              perceived item
 9   PRED              identificational predicate
10   LOC()             locational particle
11   LOC               locational predicate
12   POSS              possessional predicate
13   TIME()            temporal particle preceding time
14   TIME              time for TEMP field
15   MOD-POSS()        possessional particle
16   MOD-POSS          possessed item modifier
17   BEN()             beneficiary particle
18   BEN               benefactive modifier
19   INSTR()           instrumental particle
20   INSTR             instrument modifier
21   PURP()            purpose particle
22   PURP              purpose modifier or reason
23   MOD-LOC()         location particle
24   MOD-LOC           location modifier
25   MANNER()          manner
26                     reserved for conflated manner
27   PROP              event or state
28   MOD-PROP          event or state
29   MOD-PRED()        identificational particle
30   MOD-PRED          property modifier
31   MOD-TIME          time modifier

Figure  3.  Inventory  of  Thematic  Roles 


(or composed) from several RLCS entries corresponding to individual
words. The composition process starts with a parse tree of the
input sentence and maps syntactic leaf nodes into RLCS entries whose
argument positions are filled with other RLCS entries. For example,
the two RLCS entries we have seen already can compose together
with the constants for “John” and “school” to give the CLCS for the
sentence John jogged to school, shown in (1). The star-marked node
(* path from 3) is optional, and is left unfilled in this case. The same
RLCS could also be used to compose different CLCS representations
26.3 Verbs of Preparing
(event cause (* thing 1)
 (event go ident (* thing 2)
  (path toward ident (thing 2)
   (position at ident (thing 2) (property run+ed 9))))
 ((* for 17) poss (*head*) (* thing 18)))
Example: John ran the store for Mary.
Other verbs: bake boil clean cook fix fry grill iron mix prepare roast roll
run wash ...

47.7.a Meander Verbs (from to)
(event go_ext loc (* thing 2)
 ((* path from 3) loc (thing 2) (position at loc (thing 2) (thing 4)))
 ((* path to 5) loc (thing 2) (position at loc (thing 2) (thing 6)))
 (manner run+ingly 26))
Example: The river runs from the lake to the sea.
Other verbs: crawl drop go meander plunge run sweep turn twist wander

47.5.1.b Swarm Verbs (Locational)
(event act loc (* thing 2)
 ((* position [at] 10) loc (thing 2) (thing 11))
 (manner run+ingly 26))
Example: The dogs run in the forest.
Other verbs: bustle crawl creep run swarm swim teem ...

51.3.2.a.i Run Verbs (Locational, Theme only)
(event go loc (* thing 2)
 ((* path from 3) loc (thing 2) (position [at] loc (thing 2) (thing 4)))
 ((* path to 5) loc (thing 2) (position [at] loc (thing 2) (thing 6)))
 (manner run+ingly 26))
Example: The horse ran into the field from the barn.
Other verbs: climb crawl fly jog jump leap race run swim walk ...

Figure 4. RLCS entries for “run” in 4 different semantic verb classes


(in combination with other RLCS entries) to produce sentences like
John jogged from home or John jogged from home to school.

A CLCS can also be decomposed on the generation side in different
ways depending on the RLCS entries from the target language. Figure 5
uses a compressed graphic representation of LCS to visually compare
three different decompositions in three languages of a single CLCS. The
CLCS generated can be paraphrased as John caused himself to go to
the inside of a room in a forceful manner.
[Figure 5 graphic: the CLCS decomposed for English as “John broke into the
room” and for Spanish as “John forzó la entrada en el cuarto”.]


Figure  5.  Different  CLCS  Decompositions  into  English,  Spanish  and  Arabic 


The input to the generation component is a text-representation of a
CLCS in a format called longhand. It is equivalent to the form shown
in (1), but makes certain information more explicit and regular (at
the price of increased verbosity). The Longhand CLCS can either be a
fully language-neutral interlingua representation, or one which still
incorporates some aspects of the source-language interpretation process.
This latter may include grammatical features on LCS nodes, but also
nodes, known as functional nodes, which correspond to words in the
source language but are not LCS-nodes themselves, serving merely as
place-holders for feature information. Examples of these nodes include
punctuation markers, coordinating conjunctions, grammatical aspect
markers, and determiners.

An important extension of the LCS input language is the in-place
representation of ambiguous sub-trees as a possibles node (denoted
:possibles), which has the various possibilities represented as its own
children. For example, the following structure (with some aspects elided
for brevity) represents a node that could be one of three possibilities.
In the second one, the root of the sub-tree is a functional node, passing
its features to its child, country+:

(2) (:possibles
      (middle+ (country+ (developing+/p)))
      (functional (postposition among)
        (country+ (developing+/p)))
      (china+ (country+ (developing+/p))))
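
To make the packed representation concrete, the Python sketch below
enumerates every unambiguous reading of a tree that contains :possibles
nodes. It is an illustration under assumptions rather than the system's code:
trees are modelled as nested Python lists whose first element is a string
head, and the function name expand is hypothetical.

```python
from itertools import product

POSSIBLES = ":possibles"

def expand(tree):
    """Return the list of unambiguous readings of a packed tree.

    A tree is either a string leaf or a list [head, child, ...] whose head
    is a string. Children of a :possibles node are alternative sub-trees.
    """
    if isinstance(tree, str):
        return [tree]
    head, *children = tree
    if head == POSSIBLES:
        readings = []
        for alternative in children:
            readings.extend(expand(alternative))
        return readings
    # An ordinary node: combine one reading of each child in every way.
    child_readings = [expand(child) for child in children]
    return [[head, *combo] for combo in product(*child_readings)]

# Structure (2), written as nested lists.
example_2 = [POSSIBLES,
             ["middle+", ["country+", ["developing+/p"]]],
             ["functional", ["postposition", "among"],
              ["country+", ["developing+/p"]]],
             ["china+", ["country+", ["developing+/p"]]]]
readings = expand(example_2)

# Two independent ambiguous children multiply: 2 x 3 = 6 readings.
combined = expand(["root", [POSSIBLES, "a", "b"], [POSSIBLES, "c", "d", "e"]])
```

The multiplication in the second call shows why the in-place representation
is attractive: the packed tree grows with the sum of the alternatives, while
the number of fully expanded readings grows with their product.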

It is important to point out that in our Chinese-English Translation
project, sentences were not quite as simple as the examples used so
far to explain the LCS approach. Figure 6 displays a CLCS from our
machine translation system that was derived from the Chinese sentence
in (3).

(3) [Chinese characters omitted]

    in cardinalizer 21 session SEA-Singapore-Australia
    central-bank-organization chief seminar at ,
    chinese-peoples-bank deputy chief YinJieYan concerning “
    capital large-amount influx situation beneath macro economic
    policy DE agreement ” issue express opinions

    At the 21st Southeast Asia-Singapore-Macao Central Bank
    Organization Presidents’ Symposium, vice president of the People’s
    Bank of China Yin Jieyan expressed his opinion on “coordination
    of macro-economic policy with a large capital inflow”

Figure 6 hides the ambiguity in the CLCS by only showing a single
possibility when many occur. However, ambiguous nodes do indicate
the number of the possibilities through the small black boxes under
the node. For example, in Figure 6, the top node has four distinct
possibilities corresponding to the verbs issue, publish, and announce
(two instances of the latter). The number of distinct possible CLCS
representations is 128. The average number of nodes per CLCS in this
example is about 50. Compare these figures to those for the example
in Figure 5: zero ambiguity, one CLCS, and ten nodes.


Figure  6.  Large-scale  CLCS 

The  rest  of  the  examples  in  this  paper  will  refer  to  the  less  complex 
CLCS  for  the  Chinese  sentence  in  (4). 

(4) [Chinese characters omitted]

    us unilateral reduce China textile_product export quota

    The United States unilaterally reduced the China textile export
    quota

The representation for this example is shown in (5) below, which roughly
corresponds to “The United States caused the quota (modified by
China, textile and export) to go identificationally (or transform)
towards being at the state of being reduced.” This LCS is presented
without all the additional features, or type and function markers, for
the sake of clarity. Also, it is actually one of eight possible LCS
compositions produced by the analysis component from the input Chinese
sentence.

(5) (cause (united_states+)
      (go ident (quota+ (china+) (textile+) (export+))
        (to ident (quota+ (china+) (textile+) (export+))
          (at ident (quota+ (china+) (textile+) (export+))
            (reduce+ed))))
      (with instr (*HEAD*) nil)
      (unilaterally+/m))
Figure 7. Generation System Architecture


4.  The  Generation  System 

The architecture of the generation system is presented in Figure 7,
showing the main modules and sub-modules, and flow of information
between them. In the generation process, the first phase, Lexical Choice,
uses language-specific lexicons that relate lexical items in the target
language to their LCS representation. The output of this phase is a
target-language representation of the sentence in a modified form of
the Abstract Meaning Representation (AMR) interlingua called
LCS-AMR. The second phase, Realization, first handles the linearization
and morphology to generate lattices of target-language sequences from the
LCS-AMR and then statistically extracts preferred sequences using a
bigram language model. For linearization, we use our own
language-independent linearization engine, Oxygen (Habash, 2000). As for
the statistical extraction (and morphological generation), we use the
Nitrogen generation system from ISI (Langkilde and Knight, 1998a;
Langkilde and Knight, 1998b).

4.1.  Lexical  Choice 

The first major component, divided into four pipelined sub-modules as
shown in Figure 7, transforms a CLCS structure into an LCS-AMR
structure. This new representation is a modified form of the AMR
interlingua that uses words and features specific to the target language,
and also includes syntactic and semantic information from the LCS
representation that is relevant for realization.

4.1.1.  Pre-Processing 

The  pre-processing  phase  converts  the  text  input  format  into  an  inter¬ 
nal  graph  representation  for  efficient  access  of  components  (with  links 
for  parents  as  well  as  children).  This  phase  also  removes  extraneous 
source-language  features.  For  example,  it  converts  the  CLCS  in  (2) 
to  remove  the  functional  node  and  promote  country+  to  be  one  of 
the  possible  sub-trees.  This  involves  a  top-down  traversal  of  the  tree, 
including  some  complexities  when  functional  nodes  without  children 
(which  then  assign  features  to  their  parents)  are  direct  children  of 
possibles  nodes. 

4.1.2.  Lexical  Access 

The lexical access phase compares the internal CLCS form to the target
language lexicon, decorating the CLCS tree with the RLCS entries of
target-language words which are likely to match sub-structures of the
CLCS. The matching between a given CLCS and the target-language
lexicon is potentially a complex process, given the large amount of
structural similarity between the entries of the lexicon. For example, the
RLCS entries for “run” and “bake” in class 26.3 can only be
distinguished by looking 5 nodes deep from the root
(cf. Figure 4 and the discussion of verb classes above). In a previous
version of the system, we represented the lexicon in a trie structure, so
that individual entries were only consulted at appropriate points in the
CLCS tree-traversal. This still proved a fairly complex and inefficient
procedure given the large number of places that complex structures can
be embedded (e.g., complement events). Our current approach uses a
two-phase process, in which RLCS entries are first located based on
the distinguishing information (e.g., run+ed vs. bake+ed) and then
placed in the appropriate matching node (CAUSE) for later comparison.

The lexical access process thus proceeds as follows. In an off-line
lexicon processing phase, each word in the target-language lexicon is
stored in a hash-table, with each entry keyed on a designated primitive
which would be a most distinguishing node in the RLCS. Information
is also kept about how deep from the root of the RLCS this primitive’s
node is to be found. For example, the designated primitive for the RLCS
entries corresponding to class 26.3 would be run+ed (or bake+ed), and
the depth would be 5. On-line decoration then proceeds in a two-step
process, recursively examining each node in the CLCS:
(6) (i) Look for RLCS entries stored in the lexicon under the CLCS
        node’s primitive

    (ii) Store retrieved RLCS entries at the node in the CLCS that
         matches the root of this RLCS (follow a number of parent
         links from the CLCS node corresponding to the depth of the
         designated primitive).
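
The two steps in (6) can be sketched as follows. This Python fragment is a
simplified illustration, not the system's implementation: the names Node,
build_index, and decorate are hypothetical, lexicon entries are reduced to
(word, designated primitive, depth) triples, and the depth is assumed to
count the root as level 1, so an entry at depth 5 is stored four parent links
above the node where it was found.

```python
from collections import defaultdict

class Node:
    """A CLCS node with links to both its children and its parent."""
    def __init__(self, primitive, children=()):
        self.primitive = primitive
        self.children = list(children)
        self.parent = None
        self.candidates = []          # words whose RLCS root may match here
        for child in self.children:
            child.parent = self

def build_index(lexicon):
    """Off-line phase: hash each entry on its designated primitive."""
    index = defaultdict(list)
    for word, designated_primitive, depth in lexicon:
        index[designated_primitive].append((word, depth))
    return index

def decorate(node, index):
    """On-line phase: apply steps (6)i and (6)ii to every CLCS node."""
    for word, depth in index.get(node.primitive, []):      # step (6)i
        target = node
        for _ in range(depth - 1):                          # step (6)ii
            target = target.parent
            if target is None:      # entry is deeper than this CLCS allows
                break
        if target is not None:
            target.candidates.append(word)
    for child in node.children:
        decorate(child, index)

def quota():
    return Node("quota+", [Node("china+"), Node("textile+"), Node("export+")])

# CLCS (5), with the instrument modifier simplified to a single node.
at = Node("at", [quota(), Node("reduce+ed")])
to = Node("to", [quota(), at])
go = Node("go", [quota(), to])
root = Node("cause", [Node("united_states+"), go, Node("with"),
                      Node("unilaterally+/m")])

lexicon = [("reduce", "reduce+ed", 5), ("United States", "united_states+", 1),
           ("China", "china+", 1), ("quota", "quota+", 1),
           ("with", "with", 1), ("unilaterally", "unilaterally+/m", 1)]
decorate(root, build_index(lexicon))
```

After decoration, the entry for reduce sits at the root cause node even
though it was retrieved at the reduce+ed leaf, while single-node entries such
as quota stay on the nodes where they were found.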


Figure 8 shows some of the English entries matching the CLCS in
(5). For most of these words, the designated primitive is the only node in
the corresponding LCS for that entry. For reduce, however, reduce+ed
is the designated primitive. When traversing the CLCS nodes in (5),
this entry will be retrieved at the reduce+ed node in step (6)i; it will
be stored at the root node of (5) in accordance with step (6)ii.


(:DEF_WORD "reduce"
 :CLASS "45.4.a"
 :THETA_ROLES "_ag_th,instr(with)"
 :WN_SENSE (00154752 00162871 00163072 00163532)
 :LANGUAGE ENGLISH
 :LCS (event cause (* thing 1)
       (event go ident (* thing 2)
        (path toward ident (thing 2)
         (position at ident (thing 2) (reduce+ed 9))))
       ((* position with 19) instr (*head*) (thing 20)))
 :VAR_SPEC ((1 (animate +))))

(:DEF_WORD "United States" :LCS (thing united_states+ 0))
(:DEF_WORD "China" :LCS (thing china+ 0))
(:DEF_WORD "quota" :LCS (thing quota+ 0))
(:DEF_WORD "with"
 :LCS (position with instr (thing 2) (* thing 20)))
(:DEF_WORD "unilaterally"
 :LCS (manner unilaterally+/m 0))


Figure  8.  Lexicon  entries 

4.1.3. Alignment/Decomposition

The  heart  of  the  lexical  choice  phase  is  the  decomposition  process.  In 
this  phase,  we  attempt  to  align  RLCS  entries  selected  by  the  lexical 
access portion with parts of the CLCS, to find a covering of the CLCS
graph that satisfies the “full coverage constraint” of the original
algorithm described in (Dorr, 1993b). Our algorithm differs from that in
(Dorr, 1993b) in its inclusion of some extensions to handle the in-place
ambiguity represented by the possibles nodes.

The algorithm recursively checks whether CLCS nodes match
corresponding RLCS nodes coming from the lexical entries retrieved and
stored in the previous phase. If significant incompatibilities are found,
the lexical entry is discarded. If all (obligatory) nodes in the RLCS
match against nodes in the CLCS, then the rest of the CLCS is
recursively checked against other lexical entries stored at the remaining
unmatched CLCS nodes.

A CLCS node matches an RLCS node if the following conditions
hold:

(7) (i) The primitives are the same (or the primitive for one is a
wild-card, represented as nil)

(ii) The types (e.g., thing, event, state, etc.) are the same (or nil)

(iii) The fields (e.g., identificational, possessive, locational, etc.) are
the same (or nil)

(iv) The positions (e.g., subject, argument, or modifier) are the
same

(v) All obligatory children of the RLCS node have corresponding
matches (recursively invoking this same definition) to children
of the CLCS

Star-marked nodes in an RLCS (nodes indicated with a “*”; see also the
discussion above) require not just a match against the corresponding
CLCS node, but also a match against another lexical entry. Thus, in
(5), the node (united_states+) must match not only with the
corresponding node from the RLCS for “reduce” in Figure 8, (* thing
1), but also with the RLCS for “United States”, united_states+. The
result is that some CLCS nodes must match multiple RLCS nodes.

Subject and argument children are obligatory unless specified as
optional, whereas modifiers are optional unless specified as obligatory (see
Figure 2 for an example of an optional marking). In the RLCS for
“reduce” in Figure 8, the nodes corresponding to agent and theme
(numbered 1 and 2, respectively) are obligatory, while the instrument (the
node numbered 19) is optional. Thus, even though there is no matching
lexical entry for node 20 (“*”-marked in the RLCS for “with”), the main
RLCS for “reduce” is allowed to match, though without any realization
for the instrument.
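The matching conditions in (7), together with the obligatory/optional defaults just described, can be sketched as a recursive predicate. The Python below is a simplified illustration, not the system's code: the `LNode` record and its attribute names are hypothetical, star-marked nodes and possibles nodes are ignored, and nothing prevents two RLCS children from matching the same CLCS child.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LNode:
    """Hypothetical LCS node; None stands for the nil wild-card."""
    primitive: Optional[str] = None
    type: Optional[str] = None
    field_: Optional[str] = None
    position: str = "argument"          # "subject", "argument" or "modifier"
    optional: Optional[bool] = None     # None = default for the position
    children: list = field(default_factory=list)

def is_obligatory(node):
    """Subjects/arguments default to obligatory, modifiers to optional."""
    if node.optional is not None:
        return not node.optional
    return node.position in ("subject", "argument")

def same_or_nil(a, b):
    return a is None or b is None or a == b

def matches(clcs, rlcs):
    """Conditions (7)i-(7)v for one CLCS/RLCS node pair."""
    if not same_or_nil(clcs.primitive, rlcs.primitive):    # (7)i
        return False
    if not same_or_nil(clcs.type, rlcs.type):              # (7)ii
        return False
    if not same_or_nil(clcs.field_, rlcs.field_):          # (7)iii
        return False
    if clcs.position != rlcs.position:                     # (7)iv
        return False
    for rchild in rlcs.children:                           # (7)v
        if is_obligatory(rchild) and not any(
                matches(c, rchild) for c in clcs.children):
            return False
    return True

# Toy fragment: agent and theme are present, the instrument is not.
clcs = LNode("cause", "event", children=[
    LNode("united_states+", "thing", position="subject"),
    LNode("quota+", "thing", position="argument"),
])
rlcs = LNode("cause", "event", children=[
    LNode(None, "thing", position="subject"),
    LNode(None, "thing", position="argument"),
    LNode("with", "position", "instr", position="modifier"),
])
```

With these toy nodes, `matches(clcs, rlcs)` succeeds because the unmatched instrument is an optional modifier; marking that modifier obligatory makes the match fail.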

A complexity in the algorithm occurs when there are multiple
possibilities in a position in a CLCS. In this case, only one of these
possibilities is required to match all the corresponding RLCS nodes in
order for a lexical entry to match. In the case where some of these
possibilities do not match any RLCS nodes (meaning there are no
target-language realizations for these constructs), these possibilities
can be pruned at this stage. On the other hand, ambiguity can also be
introduced at the decomposition stage, if multiple lexical entries can
match a single structure.

The result of the decomposition process is a match-structure
indicating the hierarchical relationship between all lexical entries which,
together, cover the input CLCS.

4.1.4. LCS-AMR Creation

The match structure resulting from decomposition is then converted
into the appropriate input format used by the Nitrogen generation
system. Nitrogen's input, Abstract Meaning Representation (AMR),
is a labeled directed feature graph written using the syntax for the
PENMAN Sentence Plan Language (Penman, 1989). A BNF structural
description of an AMR is shown in (8).

(8)  AMR  =  <concept>  |  (<label>  /  <concept>  {<role>  <AMR>}*) 

An AMR is either a basic concept such as |run|, |john| or |quickly|
or a labeled instance of a concept that is modified by a set of
feature-value pairs. Features, or roles, can be syntactic (such as :subject) or
semantic (such as :agent). The basic notation / is used to specify an
instance of a concept in a non-ambiguous AMR.

We  have  extended  the  AMR  language  to  accommodate  the  thematic 
roles  and  features  provided  in  the  CLCS  representation;  the  resulting 
representation  is  called  an  LCS-AMR.  To  distinguish  the  LCS  terms 
from  those  used  by  Nitrogen,  we  mark  most  of  the  new  roles  with  the 
prefix :LCS-. Figure 9 shows the LCS-AMR corresponding to the CLCS
in  (5),  decomposed  using  the  lexicon  entries  in  Figure  8. 

(a7537 / |reduce|
   :LCS-NODE 6253520
   :LCS-VOICE ACTIVE
   :CAT V
   :TELIC +
   :LCS-AG (a7538 / |United States|
              :LCS-NODE 6278216
              :CAT N)
   :LCS-TH (a7539 / |quota|
              :LCS-NODE 6278804
              :CAT N
              :LCS-MOD-THING (a7540 / |China|
                                :LCS-NODE 6108872
                                :CAT N)
              :LCS-MOD-THING (a7541 / |textile|
                                :LCS-NODE 6111224
                                :CAT N)
              :LCS-MOD-THING (a7542 / |export|
                                :LCS-NODE 6112400
                                :CAT N))
   :LCS-MOD-MANNER (a7543 / |unilaterally|
                      :LCS-NODE 6279392
                      :CAT ADV))


Figure 9. LCS-AMR


The LCS-AMR in Figure 9 can be read as an instance of the concept
|reduce| whose category is a verb and is in the active voice. The concept
|reduce| has two thematic roles related to it, an agent (:LCS-AG) and
a theme (:LCS-TH); and it is modified by the concept |unilaterally|.
The different roles modifying |reduce| come from different origins. The
:LCS-NODE value comes directly from the unique node number in the
input CLCS. The category, voice and telicity are derived from features
of the RLCS entry for the verb |reduce| in the English lexicon. The
specifications agent and theme come from the RLCS representation of
the verb reduce in the English lexicon as well, as can be seen by the
node numbers 1 and 2 in the lexicon entry in Figure 8. The role
:LCS-MOD-MANNER combines two facts: the corresponding AMR had
a modifier role in the CLCS, and its type is Manner.

We have additionally extended the AMR syntax in our system by
providing the ability to specify an ambiguous AMR as an instance-less
conglomeration of different AMRs; this is achieved by means of the
special role :OR. For example, a variant of the LCS-AMR in Figure 9
in which the root concept is three-way ambiguous would appear as in
(9) (details below the root omitted).

(9) (# :OR (# / |reduce| ...)
       :OR (# / |cut| ...)
       :OR (# / |decrease| ...))

4.2.  Realization 

The LCS-AMR representation is then passed to the realization module,
which uses the Nitrogen approach to generation. The strategy used by
Nitrogen is to allow over-generation of possible sequences of
target-language words from the ambiguous or under-specified AMRs and then
decide amongst them based on bigram frequency. The interface between
the linearization module and the statistical extraction module is a word
lattice of possible renderings. The Nitrogen package offers support for
both subtasks, linearization and statistical extraction. Initially, we used
the Nitrogen grammar to do linearization. But complexities in recasting
the LCS-AMR roles as standard AMR roles, as well as efficiency
considerations (that will be discussed later in detail), compelled us to create
our own linearization engine for writing target-language grammars,
Oxygen (Habash, 2000).

In this module, we force linear order on the unordered parts of an
LCS-AMR. This is done by recursively calling grammar rules that
create various phrase types (NP, PP, etc.) from aspects of the LCS-AMR.
The result of the linearization phase is a word lattice specifying the
sequence of words that make up the resulting sentence and the points
of ambiguity where different generation paths may be taken. Example
(10) shows the word lattice corresponding to the LCS-AMR in Figure
9.

(10) (SEQ (WRD "*start-sentence*" BOS)
          (WRD "united states" NOUN)
          (WRD "unilaterally" ADJ)
          (WRD "reduced" VERB)
          (OR (WRD "the" ART)
              (WRD "a" ART)
              (WRD "an" ART))
          (WRD "china" ADJ)
          (OR (SEQ (WRD "export" ADJ)
                   (WRD "textile" ADJ))
              (SEQ (WRD "textile" ADJ)
                   (WRD "export" ADJ)))
          (WRD "quota" NOUN)
          (WRD "*end-sentence*" EOS))

The keyword SEQ specifies that what follows is a list of sub-lattices in
their correct linear order. The keyword OR specifies the existence of
disjunctive paths for generation. In the above example, the noun ‘quota’
is given a disjunction of all possible determiners since its definiteness is
not specified. Also, the relative order of the words ‘textile’ and ‘export’
is not resolved so both ordering possibilities are inserted into the lattice.
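To make the SEQ/OR semantics concrete, the following Python sketch enumerates every path through a lattice such as (10). It is an illustration only: the nested-tuple encoding and the function name `paths` are assumptions, and Nitrogen itself does not need to enumerate paths exhaustively in this way.

```python
from itertools import product

def paths(lattice):
    """Return every word sequence licensed by a SEQ/OR/WRD lattice."""
    kind = lattice[0]
    if kind == "WRD":                       # ("WRD", word, tag)
        return [[lattice[1]]]
    if kind == "OR":                        # union of the alternatives
        return [p for sub in lattice[1:] for p in paths(sub)]
    if kind == "SEQ":                       # concatenate one path per part
        parts = [paths(sub) for sub in lattice[1:]]
        return [[w for part in combo for w in part]
                for combo in product(*parts)]
    raise ValueError(f"unknown lattice keyword: {kind}")

# The lattice in (10), re-encoded as nested tuples.
LATTICE = ("SEQ",
    ("WRD", "*start-sentence*", "BOS"),
    ("WRD", "united states", "NOUN"),
    ("WRD", "unilaterally", "ADJ"),
    ("WRD", "reduced", "VERB"),
    ("OR", ("WRD", "the", "ART"), ("WRD", "a", "ART"), ("WRD", "an", "ART")),
    ("WRD", "china", "ADJ"),
    ("OR",
        ("SEQ", ("WRD", "export", "ADJ"), ("WRD", "textile", "ADJ")),
        ("SEQ", ("WRD", "textile", "ADJ"), ("WRD", "export", "ADJ"))),
    ("WRD", "quota", "NOUN"),
    ("WRD", "*end-sentence*", "EOS"))

ALL_PATHS = paths(LATTICE)
```

The three determiner choices and the two modifier orders multiply, so `ALL_PATHS` holds six candidate sentences; it is the statistical extractor's job to rank them.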

Finally, the Nitrogen statistical extraction module evaluates the
different paths represented in the word lattice and orders the different
word renderings using uni- and bigram frequencies calculated based on
two years of the Wall Street Journal (Langkilde and Knight, 1998b).
Example (11) shows Nitrogen's ordering of the sentences extracted from
the lattice in (10).

(11) united states unilaterally reduced the china textile export quota.
     united states unilaterally reduced a china textile export quota.
     united states unilaterally reduced the china export textile quota.
     united states unilaterally reduced a china export textile quota.
     united states unilaterally reduced an china textile export quota.
     united states unilaterally reduced an china export textile quota.
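The kind of ranking shown in (11) can be illustrated with a small bigram scorer. The sketch below is not Nitrogen's extraction algorithm: the counts are invented toy values rather than Wall Street Journal statistics, add-one smoothing is an assumption made for simplicity, and unigram frequencies are used only as context counts.

```python
import math
from collections import Counter

# Invented toy bigram counts (NOT taken from any real corpus).
BIGRAMS = Counter({
    ("the", "china"): 5, ("a", "china"): 2,
    ("china", "textile"): 4, ("china", "export"): 1,
    ("textile", "export"): 6, ("textile", "quota"): 1,
    ("export", "quota"): 7, ("export", "textile"): 1,
})

# Context counts: how often each word starts a counted bigram.
CONTEXT = Counter()
for (first, _second), count in BIGRAMS.items():
    CONTEXT[first] += count

def log_score(words, vocab_size):
    """Sum of add-one-smoothed bigram log-probabilities."""
    total = 0.0
    for w1, w2 in zip(words, words[1:]):
        total += math.log((BIGRAMS[(w1, w2)] + 1) / (CONTEXT[w1] + vocab_size))
    return total

def rank(sentences):
    """Order candidate strings from most to least likely."""
    vocab = {w for s in sentences for w in s.split()}
    vocab |= {w for bigram in BIGRAMS for w in bigram}
    return sorted(sentences,
                  key=lambda s: log_score(s.split(), len(vocab)),
                  reverse=True)

CANDIDATES = [
    "a china export textile quota",
    "an china textile export quota",
    "the china textile export quota",
]
RANKED = rank(CANDIDATES)
```

With these toy counts, the candidate beginning with "the" and using the "textile export" order is ranked first, because every one of its bigrams has been seen comparatively often.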


4.2.1.  Linearization  Issues 

The unordered nature of siblings under an LCS-AMR node
complicates the mapping between roles and their surface positions, yielding
several interesting linearization issues. In this section, we look at some
of the choices made for our English realizer for ordering linguistic
constituents.

4.2.1.1. Sentential Level Argument Ordering. Sentences are realized
according to the pattern in (12). That is, first subordinating
conjunctions, if any, then modifiers in the temporal field (e.g., “now”, “in
1978”), then the subject, then most other modifiers, the verb (with
collocations if any), then spatial modifiers (“up”, “down”), then the
indirect object and direct object, followed by prepositional phrases and
relative clauses. Nitrogen's morphology component was also used, e.g.,
to give tense to the head verb. In the example above, since there was
no tense specified in the input LCS, past tense was used on the basis
of the telicity of the verb to give “reduced” in (10) and (11).^

(12) (SubConj ,) (TempMod)* Sub (Mod)* V (coll) (SpaceMod)* (IObj)
(Obj) (PP)* (RelS)*

^ See (Dorr and Olsen, 1996) and (Olsen et al., 2001) for a detailed study on the
use of telicity for tense and aspect realization.

4.2.1.2. Thematic Role Ordering. Given the above general shape for
a sentence, there is still an issue of which thematic role should be
mapped to which argument positions. This situation is complicated by
the lack of one-to-one mapping between a particular thematic role and
an argument position. For example, a theme can be the subject in some
cases and it can be the object in others or even an oblique. Observe
cookie in (13).

(13) (i) John ate a cookie (object)

(ii) the cookie contains chocolate (subject)

(iii) she nibbled at a cookie (oblique)

To solve this problem, a thematic hierarchy is used to determine
the argument position of a thematic role based on its co-occurrence
with other thematic roles. Several researchers have proposed different
versions of thematic hierarchies (see (Jackendoff, 1972; Carrier-Duncan,
1985; Bresnan and Kanerva, 1989; Kiparsky, 1985; Larson, 1988; Giorgi,
1984; Wilkins, 1988; Nishigauchi, 1984; Alsina and Mchombo, 1993;
Baker, 1989; Grimshaw and Mester, 1988)).^ Ours differs from these in
that it separates arguments (e.g., agent and theme) from obliques (e.g.,
location and beneficiary) and provides a more complete list of thematic
roles (30 roles overall, see Figure 3) than those of previous approaches
(maximum of 8 roles).

The final thematic hierarchy for arguments was extracted by
analyzing subcategorization information in the :THETA_ROLES field for all
the verbs in our English lexicon.

(14) special case: ag {goal src ben} th
     ext > ag > instr > th > perc > Everything Else

Thus, in the case where a theme occurs alone, this role is mapped to
the first argument position. If a theme and an agent occur, the agent is
mapped to first argument position and the theme is mapped to second
argument position. When an agent and theme occur with a third role
that is either a goal, a source or a beneficiary, a middle inversion is
invoked on the order. The pseudo-role ext is used when the :VAR_SPEC
field in the lexical entry of a verb includes an :EXT marker indicating
that the verb violates the normal thematic hierarchy. The ext marker
refers to an externally marked thematic role such as the perceived
John in John_perc pleases Mary_th. As for the ordering of obliques, all
possible permutations are generated. For the LCS-AMR in Figure 9,
the thematic hierarchy is what determines that |united states| is the
subject and |quota| is the object of the verb |reduce|. A more detailed
discussion is available in (Dorr et al., 1998). We will return to discuss
thematic hierarchies later in this paper when evaluating English and
Spanish realization.

^ For an excellent overview and a comparison of different thematic hierarchies see
(Levin and Rappaport Hovav, 1996).
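One way to read the hierarchy in (14) is as a sort key with a single exception. The Python sketch below is an interpretation, not the system's implementation: the function name is hypothetical, and the "middle inversion" is assumed to mean that a goal, source or beneficiary is placed between agent and theme when exactly those three roles co-occur.

```python
HIERARCHY = ["ext", "ag", "instr", "th", "perc"]   # left = higher position
MIDDLE_ROLES = {"goal", "src", "ben"}

def order_arguments(roles):
    """Map a set of thematic roles to an ordered list of argument slots."""
    roles = list(roles)
    role_set = set(roles)
    third = role_set & MIDDLE_ROLES
    # Special case in (14): ag {goal src ben} th
    if len(roles) == 3 and {"ag", "th"} <= role_set and len(third) == 1:
        return ["ag", third.pop(), "th"]
    # General case: sort by rank; unlisted roles share the lowest rank and
    # keep their input order because sorted() is stable.
    rank = {role: i for i, role in enumerate(HIERARCHY)}
    return sorted(roles, key=lambda r: rank.get(r, len(HIERARCHY)))
```

For the LCS-AMR in Figure 9, `order_arguments(["th", "ag"])` places the agent first and the theme second, which corresponds to |united states| as subject and |quota| as object.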

4.2.1.3. NP Modifier Ordering. In most cases, our input CLCS
representations had little hierarchical information about multiple modifiers
of a noun. Our initial, brute force solution was to generate all
permutations and depend on the existing statistical extraction (in Nitrogen)
to decide amongst them. This technique worked well for noun phrases
of about 6 words, but was too costly for larger phrases (of which there
were several examples in our test corpus). We improved both the cost of
permutation generation and the fluency of the top choices by ordering
adjectives within classes, inspired by the adjective ordering scheme in
(Quirk et al., 1985). Our classification scheme is shown in (15). Each
adjective in the target-language lexicon was assigned to one of these
classes.

(15)  (i)  Determiner  (all,  few,  several,  some,  etc.) 

(ii)  Most  Adjectival  (important,  practical,  economic,  etc.) 

(iii) Age (old, young, etc.)

(iv)  Color  (black,  red,  etc.) 

(v)  Participle  (confusing,  adjusted,  convincing,  decided) 

(vi)  Provenance  (China,  southern,  etc.) 

(vii)  Noun  (Bank_of_China,  difference,  memorandum,  etc.) 

(viii)  Denominal  (nouns  made  into  adjectives  by  adding  -al,  e.g., 
individual,  coastal,  annual,  etc.) 

If multiple words fall within the same group, permutations are
generated for them. This situation can be seen for the LCS-AMR in Figure 9
with the ordering of the modifiers of the word |quota|: |china|, |export|
and |textile|. |china| fell within the Provenance class of modifiers, which
gives it precedence over the other two words. |export| and |textile|, on
the other hand, fell in the Noun class and therefore both permutations
were passed on to the statistical component. Without this ordering,
more permutations would be given to the statistical component, which,
in this case, would also get a less appropriate result: “Textile china
export quota” rather than “china textile export quota.”
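The class-based scheme can be sketched as follows: modifiers are grouped by class, classes are emitted in the order of (15), and permutations are generated only inside a class. This is a minimal illustration in Python; the function name and the dictionary that assigns words to classes are assumptions, not part of the system.

```python
from itertools import permutations, product

# Class order taken from (15).
CLASS_ORDER = ["Determiner", "Most Adjectival", "Age", "Color",
               "Participle", "Provenance", "Noun", "Denominal"]

def modifier_orders(modifiers, word_class):
    """Return all modifier orders that respect the class order in (15)."""
    groups = {}
    for word in modifiers:
        groups.setdefault(word_class[word], []).append(word)
    # One list of permutations per class that actually occurs.
    per_class = [list(permutations(groups[c]))
                 for c in CLASS_ORDER if c in groups]
    return [[w for group in combo for w in group]
            for combo in product(*per_class)]

QUOTA_MODIFIERS = ["export", "china", "textile"]
QUOTA_CLASSES = {"china": "Provenance", "export": "Noun", "textile": "Noun"}
ORDERS = modifier_orders(QUOTA_MODIFIERS, QUOTA_CLASSES)
```

For the modifiers of |quota|, this yields two orders instead of the six produced by unrestricted permutation, and |china| always comes first.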

4.2.2.  Oxygen:  Linearization  Implementation 

The linearization module is basically an implementation of a set of
rules, a grammar, that governs the relative word ordering (syntax)
and word form (morphology) of an LCS-AMR in the target language.
We have used three different linearization modules, each improving on
problematic aspects of the previous ones. We briefly look at each of
these in turn.

4.2.2.1. Nitrogen Linearization. The Nitrogen generation system
provides its own linearization module. The approach used in this module
is a declarative one where a linearization engine performs on-line
interpretation of a linearization grammar. The grammar is written in
a special grammar description language that utilizes two basic
operations: recast and linearize. A recast transforms an AMR into another
AMR based on features of the original AMR. One example of recasting
is converting an AMR with thematic roles into an AMR with surface
argument position through the use of a thematic hierarchy. The
second operation, linearize, decomposes an AMR into linearly ordered
constituents, recursively applying the grammar to each. The grammar
description language provides tools for defining conditions on which to
make decisions to recast and/or linearize an AMR.

The advantages of this declarative approach are reusability, easy
extendibility and language independence. Its main drawback is speed.
Another drawback of Nitrogen's linearization grammar is a limited
and inflexible grammar formalism. First, conditions of application are
limited to equality of concepts or existence of roles at the top level
of an AMR only. Second, recasting operations are limited to adding
feature-value pairs and introducing new nodes. And, finally, there is
no mechanism to perform range-unbounded or computationally
complex transformations such as, for example, multiplication or division to
correctly format numbers in the target language. The first two issues
necessitate writing multiple rules and cascading information in order
to implement complex decisions, which in turn increases the size of the
grammar and further reduces the performance speed. The third issue is
simply impossible to implement with the current formalism. A deeper
look at these issues is provided in (Habash, 2000).

4.2.2.2. Procedural Linearization. To contrast with Nitrogen's
declarative approach to linearization, we look at procedural implementations
of linearization grammars. In these approaches, a programming
language is used to implement the rules of the grammar. The main
advantages of this approach are flexibility, power and speed. Having access
to the full computing power of a programming language opens a lot
of possibilities for efficient implementation. It also frees the linearizer's
designer from the restrictions of a limited declarative grammar by
providing access to the operating system, databases, the web, etc. However,
a major disadvantage of this approach is that the linguistic knowledge
is coupled with the program code. This hard-coding of grammar rules
can make the system rather redundant, difficult to understand and
debug, non-reusable and language specific.

4.2.2.3. Towards Improved Generation: A Hybrid Approach. After
exploring both approaches in our system, we adopted a hybrid
implementation (declarative/procedural) that maximizes the advantages and
minimizes the disadvantages of these paradigms. The result is the
linearization module Oxygen.

Oxygen uses a linearization grammar description language to write
declarative grammar rules which are then compiled into a programming
language for efficient performance. Oxygen contains three elements: a
linearization grammar description language (OxyL), an OxyL to Lisp
compiler (oxyCompile) and a run-time support library (oxyRun).
Except for Nitrogen's morphological generator submodule, all of the
Oxygen components were built at our Lab. Target-language linearization
grammars written in OxyL are compiled off-line into Oxygen
Linearizers using oxyCompile (Figure 10).


[Diagram: Linearization Grammar (OxyL) -> oxyCompile (Lisp) -> Oxygen Linearizer (Lisp)]

Figure 10. Oxygen Compilation Step


Oxygen Linearizers are Lisp programs that require the oxyRun
library of basic functions in order to execute (Figure 11). They take
AMRs as input and create word lattices that are passed to a statistical
extraction unit.


[Diagram labels: AMR]

Figure 11. Oxygen Runtime Step

This implementation maximizes the advantages and minimizes the
disadvantages inherent in the declarative and procedural paradigms.
The separation of the linearization engine (oxyRun) from the
linearization grammar (OxyL) combines in one system the best of two worlds:
(1) the simplicity and focus of a declarative grammar with the power
and efficiency of a procedural implementation; and (2) the efficiency
of a resource-sharing implementation. Regarding the first point, the
approach provides language independence and reusability since needs
of the target language are only addressed in its specific OxyL grammar.
Regarding the second point, the separation of language-specific code
(compiled OxyL) from language-independent code (oxyRun) is efficient,
especially when running multiple linearizers for different languages at
the same time as in multilingual generation.

Moreover, Oxygen's linearization grammar description language, OxyL,
is as powerful as a regular programming language but with a focus on
linearization needs. This is accomplished through providing powerful
recasting mechanisms for the most common needs of a linearization
grammar and also by allowing embedding of code in a standard
programming language (Lisp). This allows for efficient implementation of
the more language-specific realization problems (e.g., number
formatting). OxyL linearization grammars are also simple, clear, concise and
easily extendible. An example of the simplicity of OxyL grammars is the
reduction of redundancy. For example, the handling of :OR ambiguities
in each phrase rule (see, e.g., (9)) is hidden from the linearization
grammar designer and is treated only in the compiler and support library.
For a detailed presentation of OxyL's syntax, see (Habash, 2000).

Figure 12 presents a small OxyL grammar that is enough to linearize
the LCS-AMR in Figure 9. In this grammar, the user-defined recast
operation &TH-order uses the OxyL special hierarchical recast operator,
<!, to recast a small hierarchy of (agent, instrument, theme, source
and goal) into subject and object positions. Rules %S and %NP linearize
the different LCS-AMRs associated with specific roles. For example,
@subject refers to the LCS-AMR paired with the role :subject.
However, note that since @lcs-mod-thing matches three roles (i.e., |china|,
|export| and |textile|), an ambiguous LCS-AMR is created and all its
permutations are explored linearly. This is done at the engine level and
is hidden from the user. A linearization can specify hard-coded
elements such as the determiners in %NP. The rule :MainRule determines
which phrase-level rule to apply by considering the category, i.e., part
of speech, of the LCS-AMR instance. This is accomplished using the
automatically defined function @cat, which returns the value associated
with the field :CAT in the LCS-AMR. The sequence ?? X -> Y -> Z
roughly corresponds to if X then Y else Z. The rule :MainRule is
applied recursively until no more LCS-AMRs exist.


:Recast &TH-order
 (@this <! ((:subject :object) /
            (:lcs-ag :lcs-instr :lcs-th :lcs-src :lcs-goal)))
:Rule %S
 (-> (@subject (@inst +- past) @object @lcs-mod-manner))
:Rule %NP
 (-> ((*or* "a" "an" "the") @lcs-mod-thing @inst))
:MainRule
 ((?? (&eq @cat V) -> (do %S (&TH-order @this))
   ?? (&eq @cat N) -> (do %NP)
   -> (@inst))


Figure 12. A Simple OxyL Grammar


The complete English linearization grammar used in our system is
much larger and more complex than the one shown in Figure 12. It
includes 14 different phrase structure rules and four user-defined recast
operations and it is about 300 lines of code long. The quality of the
English output produced is evaluated in Section 6.


5.  Generation  into  Multiple  Languages 

While most of the effort has been spent on generation into English,
in the context of Chinese-English translation, there has been some
work using these components for generation into other languages. The
main algorithms are all language independent, and retargeting the
system for another language involves only the following language-specific
resources:

- Target-language LCS lexicon: a set of RLCS entries linking target
  language words to lexical conceptual structures, as described in
  Section 3.

- Target-language linearization grammar, in OxyL (see Section 4.2.2).

- Word n-gram statistics for the target language, for use by the lattice
  extractor.

In addition, the following pre-processing steps are also needed for
creating a generation system for the target language:

- Hashing of the target-language lexicon by “designated primitives”, for
  rapid on-line retrieval (see Section 4.1.2).

- Running oxyCompile on the linearization grammar to create an
  Oxygen Linearizer for the target language (see Section 4.2.2).

- Creation of a target-language n-gram database, for use by the
  statistical lattice extractor.

An important feature of a translation approach using an interlingua
such as LCS is that the same grammar can be used for analysis and
generation. Thus we already have a major component for a Chinese
generation system. Likewise, large LCS lexicons also exist for other
languages such as Spanish and Arabic (Dorr, 1997a).

We have also created a linearization component for Spanish, using
a simple OxyL Spanish linearization grammar. This grammar
concentrates on argument word order relative to the verb. It utilizes a
thematic hierarchy mapping that is very similar to that of English. We
avoided dealing with complex Spanish morphology by using the simple
‘near-future’ construction (va a + INF). One example is alguien_ag va a
colocar algo_th en algo_goal (someone_ag will (is going to) place something_th
in something_goal). In addition to lacking a complete phrase structure
for parts of speech other than verbs, the Spanish linearization grammar
doesn't handle pro-drop or clitics. In principle, both phenomena can be
handled with a recast rule that would fire after the thematic hierarchy
recast. In the case of pro-drop, it would conjugate the verb and make the
subject null. In the case of clitics, it would add a clitic that matches the
gender and number of the object.

A similar but even less sophisticated linearization grammar was
created to generate Chinese. A preliminary study showed some promising
results as far as thematic hierarchy mapping is concerned. However,
Chinese seems to require more complex linearization rules and post-lexical
selection manipulations, especially for obliques.

We have not yet built an n-gram extractor for other languages.
Preliminary evaluation of Spanish generation is given in Section 6.4.


6.  Evaluation 

The evaluation of machine translation and natural language generation
systems is more of an art than a science. Evaluation of generation
systems is difficult because the ultimate criterion is translation
quality, which can itself be difficult to judge; moreover, it can be
hard to attribute specific deficits to the analysis phase, the lexical
resources, or the generation system proper. A wide range of metrics
and techniques have been developed over the last fifty years to assess
‘how good’ a system is. Evaluation schemas vary in their focus from
addressing the system's interface to system scalability, faithfulness,
space/time complexity, etc. Another dimension of variation is human
versus automatic evaluation. Fully automatic evaluation, a task that
is AI-complete (i.e., encompassing all components of any system that
would be deemed “intelligent”), is the ultimate goal in the field.^

In (Church and Hovy, 1991), three categories of MT evaluation metrics
are described: system-based, text-based and cost-based. System-based
metrics count internal resources: size of lexicon, number of grammar
rules, etc. These metrics are easy to measure, although they are
not comparable across systems, and their value is questionable since
they are not necessarily related to utility.

Text-based metrics can be divided into sentence-based and
comprehensibility-based. Sentence-based metrics examine the quality
of single sentences out of context. These metrics include Accuracy,
Fluency, Coherence, etc. Typically, subjects evaluating sentences are
given a description of the metric with examples and are asked to rate
the sentences on an x-point scale. These scales range from 3-point to
100-point. Comprehensibility metrics measure the comprehension or
informativeness of a complete text composed of several sentences. The
subjects are typically given questionnaires related to the processed text.
Text-based metrics are much more closely related to utility than
system-based metrics, but they are also much harder to measure. There are
some automatic text-based evaluation metrics that measure the amount of
post-editing needed for a sentence given a gold standard. These are
variations on edit distance, i.e., the number of deletions, additions or
modifications measured in words or keystrokes per page or sentence.
These techniques are not necessarily related to utility; however, it was
recently shown that the smarter tree-based edit distance might actually
correlate better with human judgement (Bangalore and Rambow, 2000).
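
To make the edit-distance family of metrics concrete, the following is a minimal sketch of a word-level edit distance between a system output and a gold-standard sentence. It is a generic Levenshtein computation and not the specific post-editing metric of any work cited here; the function name `word_edit_distance` is introduced purely for illustration.

```python
def word_edit_distance(candidate: str, reference: str) -> int:
    """Minimum number of word insertions, deletions or substitutions
    needed to turn `candidate` into `reference`."""
    cand, ref = candidate.split(), reference.split()
    # prev[j] is the distance between the processed prefix of cand and ref[:j].
    prev = list(range(len(ref) + 1))
    for i, c_word in enumerate(cand, start=1):
        curr = [i]
        for j, r_word in enumerate(ref, start=1):
            cost = 0 if c_word == r_word else 1
            curr.append(min(prev[j] + 1,          # delete c_word
                            curr[j - 1] + 1,      # insert r_word
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Usage with an example pair in the style of the accuracy scale examples.
gold = "the united states unilaterally reduced china's textile export quotas"
output = "united states reduced china textile export quota unilaterally"
print(word_edit_distance(output, gold))
```

Dividing the result by the length of the reference would give a per-word post-editing rate; a tree-based variant would operate on parse trees instead of flat word sequences.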

Cost-based metrics evaluate a system on how much money or time it
saves or costs per unit of text, say a page. These are secondary metrics
since they depend on other metrics to evaluate how much pre- and
post-processing is necessary for a commercially functional system.


* For excellent surveys of machine translation evaluation metrics and techniques,
see (Hovy, 1999; Hovy, 1999).
Table I. Oxygen Evaluation

                          Procedural   Hybrid     Declarative
                          (Lisp)       (Oxygen)   (Nitrogen)
Speed                     +            0          -
Size                      0            +          -
Expressiveness            +            +          -
Reusability               -            +          +
Readability/Writability   -            +          -

6.1.  Preliminary  Evaluations 

Different aspects of our system were evaluated in previous papers. In
(Dorr et al., 1998) and in more recent work (Habash and Dorr,
2001), the thematic hierarchy implementation proved successful, and
the generation system was demonstrated to be a diagnostic tool for fixing
the lexicon, algorithmic errors, and inconsistencies in English and Spanish
output.

Another major evaluation addressed the general performance of the
Oxygen module (Habash, 2000). Oxygen was evaluated based on speed
of performance, size of grammar, expressiveness of the grammar
description language, reusability and readability/writability. The
evaluation context is provided by comparing an Oxygen linearization
grammar for English to two other implementations, one procedural (using
Lisp) and one declarative (using the Nitrogen linearization module). The
three comparable linearization grammars were used to calculate speed
and size. Overall, Oxygen had the highest number of advantages, and
on its only disadvantage, speed, it ranked second to the Lisp implementation
(see Table I).

The generation component has also been used on a broader scale,
generating thousands of simple sentences (at least one for each verb
sense in the English LCS lexicon) to create sentence templates for
use in a cross-language information retrieval system (Dorr et al.,
2000).

These previous evaluation efforts have been fairly coarse-grained and
subjective. In the rest of this section, we report on both quantitative
and qualitative evaluations of the system in several dimensions:
Translation Quality, Coverage and Retargetability. Translation Quality can
be seen as a system depth evaluation, whereas Coverage is a system
breadth evaluation. Retargetability focuses on the extendibility of the
system to other languages.

6.2.  Translation  Quality  Evaluation 

The generation system has been used as part of a Chinese-English
translation system focusing on a corpus of 10 newspaper articles from
Xinhua (Chinese People’s Daily). The articles included eighty
sentences that our translation system was able to parse, compose into
LCS interlingua, and generate into English successfully. Although the
number of sentences is small, some of them are quite complex and
represent a cross-section of the types of complex phenomena handled in
a large-scale MT system. To measure the translation quality of the system,
we performed two human evaluations: one for Accuracy (Fidelity) and one
for Fluency (Intelligibility). Both tests used a set of 25 sentences
randomly selected from the 80 original Chinese sentences that pass
completely through our translation system. For comparison purposes, we also
used a commercial Chinese-English translation system to translate these
sentences: Chinese-English Systran 3.0 Professional Edition. Thus, we
both have absolute quality metrics and compare to state-of-the-art
translation.

The test suite is a 2x2 grid: (Accuracy, Fluency) x (ChinMT, Systran).
The total number of subjects is 80, all of whom are native
speakers of English. Each subject participated in only one of the four
possible evaluations (e.g., ChinMT Accuracy or Systran Fluency) for
all 25 sentences.* The evaluation was performed online using a web
interface (see Figure 13).
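
The design above, together with the two display orders described in the accompanying footnote, can be sketched as a small assignment routine. This is only an illustration under stated assumptions: the text does not say how the 80 subjects were distributed, so the even round-robin split below (20 per grid cell, 10 per display order) is hypothetical, as are the names `display_orders` and `assign_subjects`.

```python
METRICS = ("Accuracy", "Fluency")
SYSTEMS = ("ChinMT", "Systran")


def display_orders(n=25, start=13):
    """The two sentence orders: 1..n, and start..n followed by 1..start-1."""
    straight = list(range(1, n + 1))
    rotated = list(range(start, n + 1)) + list(range(1, start))
    return [straight, rotated]


def assign_subjects(subject_ids):
    """Round-robin over 4 grid cells x 2 order versions (assumed, not documented)."""
    conditions = [(m, s, v) for m in METRICS for s in SYSTEMS for v in (0, 1)]
    orders = display_orders()
    plan = {}
    for k, sid in enumerate(subject_ids):
        metric, system, version = conditions[k % len(conditions)]
        plan[sid] = {"metric": metric, "system": system, "order": orders[version]}
    return plan


plan = assign_subjects(range(80))
print(plan[0]["metric"], plan[0]["system"], plan[0]["order"][:3])
print(plan[1]["metric"], plan[1]["system"], plan[1]["order"][:3])
```

Rotating the starting point means that sentences shown late in one version are shown early in the other, so any drop in subject attention over time is spread across both halves of the sentence set.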

6.2.1.  Accuracy  Evaluation 

This evaluation measures the Accuracy or Fidelity of the translation
system, i.e., how well a system preserves the meaning of the original
text, whether or not the target-language output is fluent. The subjects were
given 25 pairs of sentences. Each pair consists of a human translation
of the Chinese original and a machine-translated version. Subjects were
asked to rate the translation accuracy on a 5-point scale (see Table II).**

A  score  of  5  is  given  where  the  content  of  the  original  sentence  is  fully 
conveyed  (might  need  minor  corrections).  A  score  of  1  is  given  where 

* To avoid order bias that can result from degradation in subject performance
over time, each grid cell has two versions with different sentence display orders:
(1 to 25) and (13 to 25, 1 to 12).

** Loosely based on Nagao’s 7-point scale for Fidelity (Nagao, 1989).
[Figure 13: screen capture of the web-based evaluation form, with columns
"Original Sentence (Human Translation)", "Machine Translation" and
"Score" (5 to 1).]

Figure 13. MT Evaluation Interface: Accuracy


Table II. Accuracy Criteria

Score  Criterion
5      contents of original sentence conveyed (might need minor corrections)
4      contents of original sentence conveyed BUT errors in word order
3      contents of original sentence generally conveyed BUT errors in
       relationship between phrases, tense, singular/plural, etc.
2      contents of original sentence not adequately conveyed, portions of
       original sentence incorrectly translated, missing modifiers
1      contents of original sentence not conveyed, missing verbs, subjects,
       objects, phrases or clauses

the content of the original sentence is not conveyed at all. An earlier
pilot study indicated that subjects had a hard time with descriptions of
the scale and preferred examples instead. Thus, subjects were provided
with a table containing two manually constructed examples per score to
illustrate the idea behind the scoring scheme (see Table III). Figure 13
displays a screen capture of the web interface showing the first three
pairs of sentences in an Accuracy evaluation form.


Table III. Accuracy Scale Examples

Original Sentence (Human Translation):
The United States unilaterally reduced China’s textile export quotas.

Score  Machine Translation
5      -united states reduced china’s textile export quota unilaterally.
       -united states reduced china textile export quota unilaterally.
4      -united states cut china quota export textile unilaterally down.
       -united states china quota export textile cuts down unilaterally down.
3      -united states down to slash of a export textile Chinese the quotas.
       -some states united slash down reducingly down china textile of export
        ration.
2      -beautiful folk slashed porcelain export on own way.
       -state reduce quota.
1      -it cut.
       -china.

6.2.2.  Fluency  Evaluation 

In the Fluency evaluation, the subjects were given 25 machine-translated
sentences. The purpose of this evaluation is to measure the Fluency (or
Intelligibility) of the translation system. Subjects were asked to rate
the Fluency of the machine-translated sentences on a 5-point scale that
is loosely based on Nagao’s intelligibility scale (Nagao, 1989).
The scale ranges from 5 (clear meaning, fluent sentence) to 1 (meaning
absolutely unclear, sentence not fluent). Table IV details the criteria
used in measuring Fluency. We are aware that Fluency and Intelligibility
are not the same; what we were looking for is a composite metric that
includes both. Table V lists the examples given to the subjects to
help them understand and use the scale. The actual evaluation form
looked like the one shown in Figure 13 without the first column.
Table IV. Fluency Criteria

Score  Criterion
5      clear meaning, good grammar, terminology and sentence structure
4      clear meaning BUT bad grammar, bad terminology or bad sentence
       structure
3      meaning graspable BUT ambiguities due to bad grammar, bad
       terminology or bad sentence structure
2      meaning unclear BUT inferable
1      meaning absolutely unclear

Table V. Fluency Scale Examples

Score  Machine Translation
5      -the united states unilaterally reduced china’s textile export quotas.
       -the united states unilaterally reduced china textile export quotas.
4      -united states cutted china export textile ration lonely.
       -united states reduce down china quota textile export.
3      -united states reduce an quotas export textiling of the porcelain for the
        only busy a decision.
       -a Chinese ration united states cut it down.
2      -states united unilateral cut an china textile speaks ration downwardly
        down.
       -cause states go quotas to reduced.
1      -beautiful folk remedy partage china exportation filament on own
        shaving.
       -alone cut it up rations alone.

6.2.3.  Translation  Quality  Evaluation  Results 

The results of the evaluation are presented in Table VI. The number
in each cell represents the average score given by all subjects on all
sentences for each evaluation. ChinMT did slightly better than Systran,
but the difference is not statistically significant. Overall, the scores
show an average performance for both systems, glossed as follows: for
Accuracy, contents of original sentence generally conveyed BUT errors
in relationship between constituents (cf. Table II); and for Fluency,
meaning graspable BUT ambiguities exist (cf. Table IV).
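
As an illustration of how a difference between two average scores can be checked for significance, here is a minimal sketch using a two-sided permutation test on the difference of means. The text does not state which significance test was actually used, so this choice is an assumption; the ratings below are invented toy values and not the study's data.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_resamples=2000, seed=0):
    """Two-sided permutation test on the difference of means of a and b."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical per-subject mean ratings on the 5-point scale (toy values only).
toy_system_a = [3.2, 2.9, 3.4, 3.0, 3.1, 2.8]
toy_system_b = [3.0, 3.1, 2.9, 3.3, 2.7, 3.0]

print(round(mean(toy_system_a), 2), round(mean(toy_system_b), 2))
print(permutation_p_value(toy_system_a, toy_system_b))
```

A large p-value means that a gap of the observed size arises often when system labels are shuffled at random, which is the sense in which a small difference between two averages is called statistically insignificant.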

Our system was able to perform as well as a commercial system that
took many person-years to develop. The Systran 3.0 Professional Edition
Chinese-English MT system is the result of an estimated 20
person-years of work.* It utilizes a large lexicon of 150,000 root stems,
6,000 expressions, 1,000 to 2,000 Cantonese terms, 2,500 names, a
300,000-word safety net lexicon (CETA dictionary) and an optional 2,000
military terms. With this coverage, the system’s strength is in the
military, computer science, and electronics domains.

As for our system, it was developed over 6 person-years. The English
LCS lexicon includes about 12,000 entries, of which 9,500 are verbs and
900 are prepositions. The remaining 1,200 are nouns and adjectives,
which may be dynamically generated based on specific domain needs.

Since our system is interlingual, all of its resources are readily
extendible for use with other languages for both Analysis and Generation.
A case in point is a previous Language Tutoring project using LCS
resources, which was retargeted from Arabic to Spanish in 1/6th the time it
took to build the original project (Dorr, 1997b).


6.2.4.  Analysis  of  Translation  Quality  Results 

For the most part, the Nitrogen strategy of over-generating translation
hypotheses coupled with selection according to bigram likelihoods
(Langkilde and Knight, 1998a) works very well. There are some difficulties
that can be seen as responsible for the average scores received. One
major issue is that, especially with the bigram language model’s bias for
shorter sentences, fluency is given preference over translation accuracy.
Thus, if there is some material that is considered optional (e.g., by
the decomposition process), and there are lattice entries both with
and without this information, the extractor will tend to pick the path
without this information. While this technique is also very successful
at picking out more fluent, terse formulations (e.g., “John went to the
bank” rather than “John went to at the bank”, or “convincing proof”
rather than “proof having convincingness”), further work is needed to
assess the right ratio of terseness vs. informativeness. Also, bigrams are
obviously inadequate for capturing long-distance dependencies, and so,
if things like agreement are not carefully controlled in the symbolic
component, they will be incorrect in some cases.
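
The length bias described above follows directly from how a bigram score is computed: every extra word adds another negative log-probability term. The toy sketch below shows this with an invented bigram table (none of the probabilities come from the actual Nitrogen model). The per-word normalization shown as a counterweight is an assumption made for illustration, not something the system is reported to do.

```python
import math


def bigram_logprob(words, logp, unseen=math.log(1e-4)):
    """Sum of log P(w_i | w_{i-1}); unseen bigrams get a small floor value."""
    total, prev = 0.0, "<s>"
    for w in words:
        total += logp.get((prev, w), unseen)
        prev = w
    return total


def pick_best(paths, logp, per_word=False):
    """Choose the highest-scoring path, optionally normalizing by length."""
    def score(path):
        s = bigram_logprob(path, logp)
        return s / len(path) if per_word else s
    return max(paths, key=score)


# Invented toy probabilities for illustration only.
TOY_LOGP = {
    ("<s>", "john"): math.log(0.5),
    ("john", "went"): math.log(0.4),
    ("went", "to"): math.log(0.5),
    ("to", "the"): math.log(0.4),
    ("the", "bank"): math.log(0.2),
    ("bank", "yesterday"): math.log(0.9),
}
SHORT = ["john", "went", "to", "the", "bank"]
LONG = SHORT + ["yesterday"]  # same path plus optional material

print(" ".join(pick_best([SHORT, LONG], TOY_LOGP)))
print(" ".join(pick_best([SHORT, LONG], TOY_LOGP, per_word=True)))
```

With raw sums, the shorter path wins even though the extra bigram is highly probable, because any probability below 1 lowers the total. Under per-word normalization the longer path is preferred in this toy case, which illustrates the terseness versus informativeness trade-off.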


* This estimate is computed based on information provided through personal
communication with Mr Dale Bostad from NAIC (National Air Intelligence Center),
the agency that sponsored the development of this product.
Table VI. Chinese-English Translation Quality Results

            LCS-based MT   Systran 3.0 Professional
Accuracy    3.08           3.01
Fluency     3.15           3.12

Table VII. CLCS Test Corpus Examples

Class    Example
2        someone_ag wanted something_th (to do something_th)_prop
10.5     someone_ag stole something_th from something_src for something_ben
22.1.C   someone_ag mixed something_th into something_goal
29.1.B   someone_th considered something_perc (to be someproperty_pred)_mod-pred
45.2.A   someone_ag folded something_th with something_inst
55.1.C   someone_th continued (to do something_th)_prop

6.3.  Coverage  Evaluation 

For this evaluation, a test corpus of 453 simple CLCS representations
corresponding to all LVD classes was constructed semi-automatically.*
The size of the test corpus guarantees large-scale coverage of verb
behavior and thematic role combinations, which is exhaustive for our
purpose. The CLCS representations were constructed by randomly
selecting an LCS verb entry from each English verb class
and filling all its argument positions with simple noun phrases (e.g.,
something_th, someone_ag, etc.) or simple subordinate clauses (e.g., (to
do something)_prop, (to be someproperty)_mod-prop, etc.). Table VII shows
some sample English sentences corresponding to the CLCS representations
in the test corpus.

For this evaluation, statistical extraction was disabled in order to
evaluate the whole range of possible outputs generated by the system. For
example, each of the two subclasses defining the dative alternations for
the verb send is expected to generate both alternations (i.e., John sent

* Currently, the number of classes in LVD is 492, but at the time of conducting
this evaluation, there were only 453 classes.
a book to Paul and John sent Paul a book). Out of 453 input CLCS
representations, 25 failed the lexical selection process due to problems
with lexicon entries. In the remaining cases, the lexical selection process
appropriately generated multiple sentences for each CLCS, all of which
correctly corresponded to various related alternations of the main verb.
However, there were also cases of overgeneration resulting from preposition
underspecification, which is inconsequential to our evaluation (e.g.,
go (to, toward, towards, to at, etc.) somewhere). The average number of
sentences generated per class was 4.

6.3.1.  Coverage  Evaluation  Criteria 

The results of generation were passed to a speaker of English who was
asked to mark sentences as acceptable or not acceptable on three
criteria: (1) argument generation, (2) prepositional phrase generation,
and (3) word order. Acceptable argument generation is defined as the
generation of all arguments of the verb, whether pure arguments or
obliques. Acceptable prepositional phrase generation is defined as the
generation of good preposition choices (such as goal prepositions rather
than source prepositions with an oblique goal) and the generation of a
prepositional object. Finally, acceptable word order is word order that
reflects the correct relation of the arguments to the verb.

6.3.2.  Coverage  Evaluation  Results 

Table VIII displays the results of this evaluation. The first row
represents the number and ratio of classes that generated no correct output
for each error criterion. Some classes generated both correct and
incorrect outputs. These are counted as correct on the assumption
that, given a good statistical extractor, the correct answer would rank
highest. The second row is an estimate of the percentage of unsuccessful
generation of verb senses, where the raw class results are weighted by
the number of verbs in each class. On average each class contains 21
verbs, but since some classes have more verbs than others, this
second row seems a more appropriate measure for evaluating coverage
over the full lexicon (estimating actual verbs covered rather than verb
classes). Another useful metric might be to normalize based on the
probability of occurrence of verbs, giving more weight to frequently
occurring verbs. But this is a much more complicated task because
it requires a corpus that tags verb senses with their appropriate LCS
structures.
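
The difference between the class-based and verb-based rows can be sketched as follows. The class labels and sizes in the toy example are invented for illustration and are not actual LVD classes; only the final loop uses figures from this evaluation (the failure counts reported in Table VIII and the 428 classes that passed lexical selection).

```python
def class_error_rate(failed_classes, verbs_per_class):
    """Fraction of classes that produced no correct output."""
    return len(failed_classes) / len(verbs_per_class)


def verb_weighted_error_rate(failed_classes, verbs_per_class):
    """Same failures, with each class weighted by its number of verbs."""
    total_verbs = sum(verbs_per_class.values())
    failed_verbs = sum(verbs_per_class[c] for c in failed_classes)
    return failed_verbs / total_verbs


# Hypothetical class sizes (labels and counts invented for illustration).
toy_sizes = {"class-A": 40, "class-B": 10, "class-C": 10}
toy_failed = {"class-B"}

print(f"{class_error_rate(toy_failed, toy_sizes):.0%}")          # 1 of 3 classes
print(f"{verb_weighted_error_rate(toy_failed, toy_sizes):.0%}")  # 10 of 60 verbs

# Class-based ratios for the reported failure counts, with N = 453 - 25 = 428.
N = 453 - 25
for failures in (6, 53, 5):
    print(failures, f"{failures / N:.0%}")
```

When the failing classes are smaller than average, the verb-weighted rate falls below the class-based rate, and when they are larger it rises above it; this is how the two rows of the results table can differ for the same set of failures.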

The results of this evaluation are quite encouraging in that they show
a high percentage of coverage over the LCS lexicon. Argument errors
and word-order errors were due to incorrect lexical entries. For example,
in the case of word-order errors, specific realization information such
Table VIII. Coverage Evaluation

N = 428       Argument Error   Preposition Error   Word Order Error
Class-based   6 (1%)           53 (12%)            5 (1%)
Verb-based    1%               9%                  3%

as :EXT was missing from some entries. This problem appeared in
three subclasses of class 41.3.1 (Simple Verbs of Dressing: don, doff
and wear). In our lexicon, clothes, the object for all three verbs, is
considered the theme, and the subject of the sentence is the goal, source
and location, respectively. Fixing these cases is a matter of adding the
appropriate piece of information to the lexicon. Preposition errors are
more severe in that complete entries for some prepositions were not
found in the lexicon. These errors will be fixed once the proper entries
have been added. The generation system has thus been quite helpful
as a diagnostic tool for determining errors and inconsistencies in the
lexicon.

6.4.  Retargetability 

Finally, we examine the generation system’s language independence.
For this evaluation task we used as input the same corpus of simple
CLCS entries developed for the coverage evaluation presented in the
previous section; however, we replaced the English generation system
with the Spanish one described in Section 5.

For the purposes of this evaluation, statistical extraction was
disabled because we do not have a Nitrogen bigram model for Spanish
and because we wanted to examine the range of alternations produced.

The results of the generation were passed to a speaker of Spanish
to evaluate in a manner similar to the coverage evaluation.
One extra criterion in this evaluation is a check on sense generation
correctness, i.e., whether the Spanish verb is a proper translation of
the English verb given the argument structure presented in the verb
class.

As in the case of the English generation results presented in the
previous section, some of the Spanish sentences failed the lexical
selection process due to problems with lexicon entries. However, many
more sentences were produced that should not have
been generated in Spanish. In theory, the lexical selection process limits
the number of choices using the LCS entry of the Spanish verbs. But
Table IX. Retargetability Evaluation

N = 254       Argument Error   Preposition Error   Word Order Error
Class-based   15 (6%)          85 (33%)            4 (2%)
Verb-based    10%              44%                 0%

that process is only as good as the lexicon entries. In cases where
a bad sense is generated, the sentence involved is dropped from the
evaluation. Most failures in Spanish generation are due to missing
verb entries (29% of all input classes). Erroneous lexicon entries were
responsible for another 10% of generation failures, and an additional
5% of classes were dropped from the evaluation because there was no
correct sense output. As a result, only 254 out of 453 classes (56%) have
been evaluated on argument, preposition and word-order correctness.
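
The percentages in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below assumes that all three drop-out percentages are measured against the 453 input classes; the text says so explicitly only for the first one, so this reading is an interpretation.

```python
TOTAL_CLASSES = 453
EVALUATED_CLASSES = 254

# Drop-out percentages as reported (assumed to share the same base of 453 classes).
dropped_pct = {
    "missing verb entries": 29,
    "erroneous lexicon entries": 10,
    "no correct sense output": 5,
}

remaining_pct = 100 - sum(dropped_pct.values())
evaluated_pct = round(100 * EVALUATED_CLASSES / TOTAL_CLASSES)

print(remaining_pct)   # share left after the three drop-out categories
print(evaluated_pct)   # share implied by 254 of 453 classes
```

Both values come out at 56, so under this reading the three drop-out categories account for all classes excluded from the Spanish evaluation.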

Table IX displays the results of this evaluation. The first row
represents the number and ratio of classes that generated no correct output
for each error criterion. The second row represents the same ratios
with the verb count of each class used as a weight.

The Spanish output is not as clean as the English output: it has
more overgeneration, more failures, and a higher error rate (except for
word-order errors). Argument errors are due to lexicon entries that were
incorrect or missing. Most of the preposition errors were due to
overgeneration resulting from extra incorrect entries that were added
to the lexicon automatically and were not manually checked.

A recent analysis of the Spanish lexicon indicates that 160 out of
453 semantic verb classes (about 35%) require re-verification for
inconsistencies that resulted during the process of porting the classes
from English to Spanish. (See (Dorr, 1997a) for more details of the
porting process.) However, the focus of this evaluation was not on the
quality or coverage of Spanish in our system but on the ease of
extendibility of the system to another language. Given this criterion,
this evaluation is quite positive since the amount of work that was
needed was minimal: the Spanish lexicon already existed for analysis
purposes and the OxyL grammar was created in a short period of time.
Of course, improving the quality of the system will need more work on
both fronts: the lexicon and the linearization grammar. Statistical
extraction of the best generated sentence will also have a role to play,
especially for cases of overgeneration that included both good and bad
results.
7.  Conclusions  and  Future  Work

We have presented a system for Natural Language generation from
Lexical Conceptual Structures, situated the generation system
within a larger machine translation effort, and evaluated
some key components of the results. The system has been used
both to generate very long, complex, multiply ambiguous sentences
(outputs of Chinese-to-English translation) and to generate thousands of
simple sentence templates (spanning the whole of the English verb and
preposition lexicons). Evaluation of the quality and correctness of both
modes has been carried out, showing translation quality comparable
to that of a commercial translation system. The generation system can also
be straightforwardly extended to other languages, given appropriate
target-language-specific resources (lexicon and grammar), and this has
been demonstrated and evaluated for Spanish.

As well as its utility for generating target-language sentences, the
generation system also provides a crucial step in the development cycle
for analysis and lexicon resources. Changes to a current lexicon, both
extensions and corrections, whether made manually or
by an automatic acquisition method, can be evaluated based on how
they will affect generation of sentences in that language. This has
been a valuable diagnostic tool for discovering both specific errors and
lacunae in lexicon coverage.

The biggest remaining step is a more careful evaluation of different
sub-systems and preference strategies in order to process very
ambiguous and complex inputs more efficiently without substantially
sacrificing translation quality. Another current research topic is how to
combine other metrics coming from various points in the generation process
with the bigram statistics to produce better overall outputs.